Rebase Huawei serving commits onto new-serving-2.20 #51
Closed
Copilot wants to merge 7663 commits into new-serving-2.20 from
Conversation
PiperOrigin-RevId: 837800257
…mpliant targets `tf_profiler_pybind_cc_library_wrapper` creates a cc_header_only_target, a target that exports all transitively exported headers. If py_wrap is enabled, however, cc_header_only_target just creates a cc_library target, which breaks the layering check. This change therefore makes tf_profiler_pybind_cc_library_wrapper create an alias target instead of a cc_header_only_library when py_wrap is enabled. PiperOrigin-RevId: 837811846
This change replaces usages of tsl::errors::Unimplemented with absl::UnimplementedError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::Unimplemented function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::Unimplemented with absl::UnimplementedError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 837814305
The hang was resolved at head. With the new shapes, the test takes ~8 seconds vs 110 seconds before. PiperOrigin-RevId: 837814726
1. If the collective is degenerate, emit the memcpy thunk immediately. 2. If the collective is not implementable, return an error status. 3. Otherwise, emit the collective thunk. The current logic is equivalent, but more convoluted without good reason. PiperOrigin-RevId: 837814909
…pace No longer triton specific, shared between GPU and CPU. PiperOrigin-RevId: 837820736
PiperOrigin-RevId: 837823432
This change replaces usages of tsl::errors::Internal with absl::InternalError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837836286
…ons. PiperOrigin-RevId: 837839463
…imized module and literals. PiperOrigin-RevId: 837843890
that removes most of the code duplication and call to the gpu backend in compile. PiperOrigin-RevId: 837848792
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837859077
This is to generate more helpful error messages than failing at the IFRT op execution level, e.g., `CopyArrays` complaining about mismatching devices. PiperOrigin-RevId: 837895552
PiperOrigin-RevId: 837910560
Include data_type in ExactInterpolatorKey::operator== to correctly distinguish keys. Remove the "optonly" tag from sol_latency_estimator_test. PiperOrigin-RevId: 837912807
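The fix above can be illustrated with a minimal sketch. The field names here are hypothetical (the real key's fields are not shown in this PR); the point is that omitting `data_type` from `operator==` would make keys that differ only in data type compare equal and collide:

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical sketch of the fix: include data_type in the equality
// comparison so keys differing only in data type are distinguished.
struct ExactInterpolatorKey {
  int64_t m;          // illustrative fields only
  int64_t n;
  int data_type;

  bool operator==(const ExactInterpolatorKey& other) const {
    // Before the fix, data_type was (hypothetically) omitted here.
    return std::tie(m, n, data_type) ==
           std::tie(other.m, other.n, other.data_type);
  }
};
```

Any hash function used alongside this key would need the same treatment, since equal-hashing unequal keys is exactly the collision the commit message describes.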
This change replaces usages of tsl::errors::PermissionDenied with absl::PermissionDeniedError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. Changes: - Replaced errors::PermissionDenied with absl::PermissionDeniedError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 837914991
…ors::InvalidArgument in xla This change replaces usages of tsl::errors::DataLoss with absl::DataLossError and tsl::errors::InvalidArgument with absl::InvalidArgumentError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. PiperOrigin-RevId: 837916202
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::OutOfRange function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::OutOfRange with absl::OutOfRangeError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 838006439
PiperOrigin-RevId: 838010118
This change removes `operation_queue_id: "0"`, `wait_on_operation_queues: []`, and other fields like `force_earliest_schedule: false`, `sliding_window_length: 0`, and `force_deterministic: false` from the `backend_config` in various test HLO strings. These fields are being removed because they represent default values and do not need to be explicitly specified. PiperOrigin-RevId: 838017400
PiperOrigin-RevId: 838042897
PiperOrigin-RevId: 838042906
…and resolve nvml linker errors This change addresses the deprecation of `tsl::errors::Unimplemented` by replacing its usages with `absl::UnimplementedError`, wrapping arguments in `absl::StrCat` where necessary. This brings the code closer to standard Abseil error handling. Changes: - Replaced `errors::Unimplemented` with `absl::UnimplementedError`. - Used `absl::StrCat` to construct error messages where necessary. PiperOrigin-RevId: 838065042
… xla This change replaces usages of tsl::errors::FailedPrecondition with absl::FailedPreconditionError, wrapping arguments in absl::StrCat where necessary. This addresses deprecation warnings and moves towards standard Abseil error handling. The deprecated tsl::errors::FailedPrecondition function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h. Changes: - Replaced errors::FailedPrecondition with absl::FailedPreconditionError. - Used absl::StrCat to construct error messages where necessary. PiperOrigin-RevId: 838085866
PiperOrigin-RevId: 838099456
PiperOrigin-RevId: 838120757
PiperOrigin-RevId: 838127701
…teReplicated. PiperOrigin-RevId: 838135801
It looks like we have at least two reimplementations of GetUniqueSanitizedName. PiperOrigin-RevId: 838138583
…itter. And a lot of minor refactoring. PiperOrigin-RevId: 838151169
- Enable serving build configuration - Add BatchSizeResource class for managing batch sizes in serving workloads - Add build rules for the new batch_size_resource target - Update python toolchain configuration for serving support
Introduce DynExpr (dynamic expression) support in XLA shape data structures: - Add shape_dynexpr.h with symbolic expression algebra for dynamic dimensions - Extend xla_data.proto with expression fields for dynamic dimension values - Extend xla.proto with batch size compilation options - Update Shape class to support DynExpr annotations on dimensions - Update ShapeUtil to handle shapes with dynamic expression annotations - Add build rules for the new shape_dynexpr target
Introduce DynExpr support in TensorFlow's core shape framework: - Add tensor_shape_expr.h/cc with symbolic expression support for TF shapes - Extend tensor_shape.proto with expression fields - Update TensorShape class to support expression annotations on dimensions - Update ShapeInference to propagate dynamic expression information - Update common_shape_fns to handle dynamic expressions during shape inference
- Add XlaBatchMatcher to select optimal batch sizes for XLA compilation - Support finding the next power-of-2 batch size for efficient compilation - Add tf_xla_compile_batch_sizes flag to specify compile-time batch sizes - Add tf_xla_threshold_for_megamorphic flag for megamorphic threshold - Add tf_xla_annotate_cluster_id and cluster_single_dynamic_dim flags - Update BUILD rules for new batch matcher target
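The power-of-2 selection above can be sketched as follows. This is a hypothetical helper, not the real XlaBatchMatcher API: rounding a runtime batch size up to the next power of two bounds the set of compiled shapes, so the compilation cache hits on a small number of sizes.

```cpp
#include <cstdint>

// Round n up to the next power of two (assumes n fits without overflow).
// Illustrative only; the real XlaBatchMatcher interface is not shown here.
int64_t NextPowerOfTwo(int64_t n) {
  if (n <= 1) return 1;
  int64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}
```

With this scheme, any incoming batch between 513 and 1024 is served by the same compiled executable for batch 1024, at the cost of padding.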
- Add OuterDimensionPropagation pass to propagate batch dimension info - Add GetOuterBatchValueSimplifier pass to simplify batch value expressions - Extend XLA ShapeInference to support dynamic expression propagation - Add xla_outer_batch_size debug option flag - Extend ExecutableRunOptions with batch size field - Update BUILD rules for new service passes - Fix layout_assignment, reduce_scatter_combiner, triangular_solve_expander and hlo_creation_utils for DynExpr compatibility
… support - Add batch size retrieval from ExecutableRunOptions in LLVM IR loops - Update llvm_loop to pass batch size as dynamic dimension in loop bounds - Update llvm_util to emit batch size value into LLVM IR - Update loop_emitter and elemental_ir_emitter for dynamic batch dimension - Update CPU IR emitter and thunk emitter to pass batch size to kernels - Add executable_run_options_offset utility for accessing batch size in IR - Update CPU kernel API builder to pass outer batch dimension - Update CPU runtime kernel to support dynamic outer batch dimension - Add disable-reduce-window and dynamic batch size support in CPU compiler - Update BUILD rules for new CPU serving utilities
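The mechanism in the list above, reduced to its essence: the kernel's outer loop bound is not baked into the compiled shape but read from the run options at execution time. All names here are illustrative; the real path goes through ExecutableRunOptions and emitted LLVM IR rather than a plain C++ loop.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ExecutableRunOptions carrying the dynamic
// outer batch dimension value.
struct RunOptions {
  int64_t outer_batch_size = 0;
};

// A "kernel" whose outer trip count comes from RunOptions rather than a
// static dimension: fills one value per (batch, element) pair.
std::vector<int64_t> RunKernel(const RunOptions& opts, int64_t per_batch) {
  std::vector<int64_t> out;
  for (int64_t b = 0; b < opts.outer_batch_size; ++b) {  // dynamic bound
    for (int64_t i = 0; i < per_batch; ++i) {
      out.push_back(b * per_batch + i);
    }
  }
  return out;
}
```

The same compiled loop body then serves any batch size, which is what lets one executable cover a range of runtime batches.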
- Update XlaBuilder to propagate dynamic expression annotations in shapes - Update HLO broadcast, slicing, and matrix operations for DynExpr shapes - Update HLO expanders (dot_decomposer, cholesky, eigh, qr, rng, bitcast) to preserve dynamic expression annotations during shape transformations - Update MLIR-to-HLO translation to handle DynExpr shape annotations - Update HLO pass pipeline to log dynamic expression information
Update tf2xla kernels to propagate and use dynamic expression (DynExpr) annotations when translating TF operations to XLA: - Update reshape, strided_slice, softmax, relu, reduction ops to preserve dynamic expression information during XLA lowering - Update reshape_op to handle dynamic batch dimension expressions - Update strided_slice to track dynamic dimension expressions - Update tensor_list, tensor_array, unique, and other kernels - Pass DynExpr from TF shape inference to XLA argument shapes - Add xla_compile_batch_sizes op support in xla_ops.cc - Update XlaCompiler and XlaOpKernel to thread DynExpr through compilation - Update shape_util to handle DynExpr in XLA shape conversion
- Update mark_for_compilation_pass to handle dynamic batch dimension clustering: - Add cluster_single_dynamic_dim option to limit dynamic dimensions per cluster - Exclude unranked nodes from clusters; keep output_shapes in _Arg nodes - Support tf_xla_threshold_for_megamorphic for compilation decisions - Update XlaRunOp (xla_ops.cc) to retrieve and pass batch size at runtime: - Fetch batch size from BatchSizeResource in step container - Match incoming batch to compiled shapes using XlaBatchMatcher - Handle padding and un-padding for batch-size mismatches - Update xla_launch_util to pass batch size to ExecutableRunOptions - Update encapsulate_subgraphs_pass to propagate output shape info - Update device_compiler to support batch-specific compilation caching - Update shape_inference to handle dynamic dimension expressions - Update strided_slice op and core util for DynExpr support - Update graph_properties to propagate DynExpr through grappler - Update function_ops to handle batch size in function execution - Update subgraph.cc and remapper to preserve DynExpr annotations
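The runtime matching step described above can be sketched as: given the set of batch sizes the executable was compiled for, pick the smallest one that fits the incoming batch and report how much padding is needed. This is a hypothetical helper under stated assumptions, not the real XlaBatchMatcher interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct BatchMatch {
  int64_t compiled_size;  // shape to dispatch to (-1 if none fits)
  int64_t padding;        // rows of padding to append
};

// Select the smallest compiled batch size >= runtime_batch.
BatchMatch MatchBatch(int64_t runtime_batch, std::vector<int64_t> compiled) {
  std::sort(compiled.begin(), compiled.end());
  for (int64_t size : compiled) {
    if (size >= runtime_batch) return {size, size - runtime_batch};
  }
  return {-1, 0};  // no compiled shape fits; caller must (re)compile
}
```

Choosing the smallest fitting size minimizes wasted padding work while still reusing the compilation cache.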
Copilot
AI
changed the title
[WIP] Squash commits into separate commits for review
Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20
Mar 20, 2026
Only use the Eigen-based dot product implementation when the batch dimension is dynamic, avoiding it for static shapes where the standard XLA implementation is preferable.
Extract expression inference logic into encapsulate_util.cc/h so it can be shared across encapsulation passes. This avoids duplicating the logic and makes it easier to maintain consistency across passes.
#54) Extend expression propagation to more tf2xla operators: - reshape_op: track expression changes when reshaping dimensions - reverse_sequence_op: propagate expressions through reverse_sequence - shape_op: preserve expressions when computing shape - slice_op: track expression changes for slice dimensions - split_op: propagate expressions when splitting tensors - strided_slice_op: track expression changes for strided slice
Improve padding logic in XlaRunOp to derive the values needed for padding/unpadding (value_to_pad and value_after_pad) from the dynamic expression attached to the batch dimension. This ensures accurate padding behavior when expressions are available.
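A minimal sketch of the pad/unpad round trip for a batch-major buffer, assuming hypothetical helper names (the real XlaRunOp logic additionally derives the fill values from the dimension's dynamic expression, which is not modeled here):

```cpp
#include <cstdint>
#include <vector>

// Pad the batch dimension of a batch-major buffer up to the compiled
// batch size with a fill value before execution.
std::vector<float> PadBatch(const std::vector<float>& in, int64_t row_size,
                            int64_t compiled_batch, float fill) {
  std::vector<float> out(in);
  out.resize(compiled_batch * row_size, fill);
  return out;
}

// Slice the result back down to the true batch size after execution.
std::vector<float> UnpadBatch(const std::vector<float>& in, int64_t row_size,
                              int64_t true_batch) {
  return std::vector<float>(in.begin(), in.begin() + true_batch * row_size);
}
```

For a runtime batch of 2 compiled at batch 4 with rows of 2 elements, padding appends 4 fill elements and unpadding recovers exactly the original 4.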
Copilot
AI
changed the title
Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20
Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20
Mar 20, 2026
Copilot
AI
changed the title
Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20
Rebase Huawei serving commits onto new-serving-2.20
Mar 20, 2026
Copilot stopped work on behalf of
joeyye-work due to an error
March 20, 2026 16:13
- The PR branch contained 7,663 commits above new-serving-2.20 (7,649 upstream TF commits + 1 empty "Initial plan" commit + 13 Huawei commits) instead of just the 13 Huawei commits
- Rebased onto new-serving-2.20, resolving all conflicts (all followed an additive "take theirs" pattern)
- Created copilot/squash-rebased-onto-new-serving with exactly 13 Huawei commits above new-serving-2.20
- copilot/squash-rebased-onto-new-serving must be set as the head branch (manual step required; GitHub doesn't allow changing the head branch via API)